feat: Add dryrun feature to Dynamo paths #2451
Conversation
Can you add a summary of user settings? Just dump the struct?
Also, a list of ops to be run in PyTorch.
Also, this (if it doesn't add to compilation time) should be added as debug logging to all compilation calls. And if it's printed at INFO level it might get masked depending on the user's logging settings; I think if people are explicitly calling dry run, it should be printed to STDOUT.
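A minimal sketch of the suggested STDOUT-vs-logging split; the helper name and signature here are hypothetical (the PR's actual `dryrun_stats_display`, shown in a hunk further down, takes the tracker object rather than a pre-formatted string):

```python
import logging

logger = logging.getLogger(__name__)

def display_dryrun_stats(formatted_stats: str, dryrun_enabled: bool) -> None:
    # Always record the stats at DEBUG level, so every compilation call
    # benefits from the extra visibility
    logger.debug(formatted_stats)
    # When the user explicitly requested a dry run, bypass the logger so
    # the output cannot be masked by their logging configuration
    if dryrun_enabled:
        print(formatted_stats)
```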
- Enables building of TRT engines with "dryrun" capability, meaning all phases except conversion are run and verbose logs of the graph structure and composition are printed for the user
- Improves general-purpose debug logging by printing dryrun stats to the debug logs regardless of option specification
- Provides an intuitive schematic of the graph engines, inputs, and code path through the course of the graph
- Fix test case failures
- Add detailed layer information for excluded ops
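For context, a hedged usage sketch of the new option through the Dynamo path (the toy model and input shapes are placeholders, and this assumes `dryrun` is forwarded through `torch_tensorrt.compile` kwargs):

```python
import torch
import torch_tensorrt

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 16, kernel_size=3),
    torch.nn.ReLU(),
).eval().cuda()
inputs = [torch.randn(1, 3, 224, 224).cuda()]

# With dryrun=True, every phase except TRT conversion runs, and a schematic
# of the partitioned graph (engines, inputs, Torch fallback ops) is printed
trt_model = torch_tensorrt.compile(
    model, ir="dynamo", inputs=inputs, min_block_size=1, dryrun=True
)
```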
LGTM, notes for later:
We should pretty print this: `Compiled with: CompilationSettings(precision=torch.float32, debug=True, workspace_size=0, min_block_size=1, torch_executed_ops={'torch.ops.aten.add.Tensor'}, pass_through_build_failures=False, max_aux_streams=None, version_compatible=False, optimization_level=None, use_python_runtime=False, truncate_long_and_double=False, use_fast_partitioner=False, enable_experimental_decompositions=False, device=Device(type=DeviceType.GPU, gpu_id=0), require_full_compilation=False, dryrun=True)`
Also, are we able to see the intermediate Torch graphs just like we see the TensorRT ones?
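A minimal sketch of the pretty-printing idea, assuming `CompilationSettings` is a dataclass (the field names come from the dump above):

```python
from dataclasses import fields

def pretty_print_settings(settings) -> str:
    # One "name = value" line per field instead of a single repr() blob
    lines = ["Compiled with:"]
    for f in fields(settings):
        lines.append(f"  {f.name} = {getattr(settings, f.name)}")
    return "\n".join(lines)
```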
@narendasan - thanks for the comments. I have added an issue for the suggested improvements here: #2548. We are not able to see inputs to intermediate Torch graphs yet, because they are not always packaged as modules. Specifically, our global partitioner, which uses Torch partitioning utilities, only packages the TRT engines as modules and leaves the Torch operators alone, which makes it difficult to group those into subgraphs. This improvement can be added for the fast partitioner, however, and this is noted in the new feature request issue.
Can you not iterate across the graph and just make lists of non-TensorRT ops? Even just an idea of what is left out is probably some of the most important information this feature could produce: since the goal is for all ops to be in a TRT engine, the high-order bit is what is left out.
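A sketch of that idea: walk the FX graph of a Torch-fallback submodule and collect every `call_function` target. The PR's `parse_non_trt_nodes` helper plays this role; the body below is illustrative, not the actual implementation:

```python
from typing import List
import torch

def list_torch_fallback_ops(gm: torch.fx.GraphModule) -> List[str]:
    # Every call_function node in a non-accelerated submodule is an op
    # that will execute in PyTorch rather than in a TRT engine
    return [
        str(node.target)
        for node in gm.graph.nodes
        if node.op == "call_function"
    ]
```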
@@ -295,8 +332,19 @@ def compile_module(
submodule = getattr(partitioned_module, name)
# Criteria for a module to be convertible to TRT
if settings.use_fast_partitioner and "_run_on_acc" not in name:
    dryrun_tracker.to_run_in_torch.extend(parse_non_trt_nodes(submodule))
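For readers following the hunk, a hypothetical shape of the tracker being appended to; only `to_run_in_torch` is confirmed by the diff, the other field is an assumption:

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class DryRunTracker:
    # Ops left to execute in PyTorch (confirmed by the hunk above)
    to_run_in_torch: List[Any] = field(default_factory=list)
    # Per-engine / converted-op bookkeeping is assumed, not shown in the diff
    to_run_in_tensorrt: List[Any] = field(default_factory=list)
```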
- In `global_partitioning`, `named_children` only returns TRT modules
- Can add the feature for `fast_partitioning` first, then expand to global later?
@@ -341,4 +429,6 @@ def compile_module(
if fast_partitioner_failed:
    settings.use_fast_partitioner = True

dryrun_stats_display(dryrun_tracker, settings.dryrun)
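Note that the call is unconditional: per the description above, the stats always reach the debug logs, and `settings.dryrun` controls whether they are additionally printed to STDOUT (see the logging sketch after the earlier comment).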
Pass in the graph, including the Torch module/node information, for proper display formatting.
Description
Sample Schematic
Fixes #2081
Fixes #2413
Type of change
Checklist: